• We built Thematic maps using {ggplot2}.
• But how can we use external data from other GIS software?
• What is the standard file format to store and share data?
• Today we are going to read and write Shapefiles, and
• dive into the components of an sf object!
Read Spatial data from Shapefiles using the read_sf() function from the {sf} package.
Identify the components of sf objects.
Identify the components of Shapefiles.
Write Spatial data to Shapefiles using write_sf().
This lesson requires the following packages:
if(!require('pacman')) install.packages('pacman')
pacman::p_load(rnaturalearth,
               ggplot2,
               cholera,
               here,
               sf)
pacman::p_load_gh("afrimapr/afrilearndata",
                  "wmgeolab/rgeoboundaries")
• Shapefiles are the most common data format for storing Spatial data.
• We can read them using read_sf() from {sf},
• from local files with a .shp extension,
• into a ready-to-use sf object.
• Let’s read the sle_adm3.shp file: write the path to the .shp filename, then wrap it with here() inside read_sf().
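A minimal sketch of the read step. In the lesson project the call would look like `read_sf(here::here("data/boundaries/sle_adm3.shp"))` (the folder comes from the file listing in this lesson; the object name `shape_file` is taken from the exercise residue). So that the snippet runs anywhere, it reads `nc.shp`, a sample Shapefile bundled with {sf}, as a stand-in for the lesson data.

```r
library(sf)

# Lesson version (assumes the project's data/boundaries/ folder exists):
# shape_file <- read_sf(here::here("data/boundaries/sle_adm3.shp"))

# Self-contained stand-in: the North Carolina sample Shapefile shipped with {sf}
nc_path <- system.file("shape/nc.shp", package = "sf")
shape_file <- read_sf(nc_path)

# read_sf() returns an sf object that is also a tibble / data frame
class(shape_file)
```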
• The output is an sf object and can be plotted using
geom_sf():
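A sketch of the plotting step, again using the {sf} sample file `nc.shp` as a stand-in for sle_adm3.shp. The point it shows: geom_sf() needs no x/y mapping because it finds the geometry column by itself.

```r
library(sf)
library(ggplot2)

shape_file <- read_sf(system.file("shape/nc.shp", package = "sf"))

# geom_sf() locates the geometry column automatically; no aes(x, y) needed
p <- ggplot(data = shape_file) +
  geom_sf()

print(p)
```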
Read the shapefile called sle_hf.shp inside the
data/healthsites/ folder. Use the read_sf()
function:
• Wait! Shapefiles do not come alone!
• They come with a set of sub-component files.
• Let’s check the files in the data/boundaries/ folder:
## # A tibble: 4 × 1
## value
## <chr>
## 1 sle_adm3.dbf
## 2 sle_adm3.prj
## 3 sle_adm3.shp
## 4 sle_adm3.shx
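One way to produce a listing like the one above is base R's list.files(). The lesson's own call is not shown, so this is a sketch: the commented line assumes the project folder, and the runnable part lists the folder where {sf} stores its sample `nc` Shapefile.

```r
# Lesson version (assumes the project's data/boundaries/ folder):
# list.files(here::here("data/boundaries"))

# Self-contained stand-in: the folder holding the {sf} sample nc Shapefile
shape_dir <- dirname(system.file("shape/nc.shp", package = "sf"))

# Keep only files named "nc.<extension>"; the folder also holds other samples
nc_files <- list.files(shape_dir, pattern = "^nc\\.")
nc_files
```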
• How are these files related to the sf object?
• Let’s now look under the hood to understand sf objects better.
sf objects
• "sf" stands for Simple Features, a set of widely used standards for storing geospatial information in databases.
• Now, what do sf objects look like and how do we work
with them?
• We’ll look at a slice of the countries object:
• sf is a special kind of data frame.
• We can manipulate it with {tidyverse} functions like dplyr::select().
• Let’s select three columns:
## Simple feature collection with 177 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## Geodetic CRS: WGS 84
## First 10 features:
## name pop_est geometry
## 1 Fiji 889953 MULTIPOLYGON (((180 -16.067...
## 2 Tanzania 58005463 MULTIPOLYGON (((33.90371 -0...
## 3 W. Sahara 603253 MULTIPOLYGON (((-8.66559 27...
## 4 Canada 37589262 MULTIPOLYGON (((-122.84 49,...
## 5 United States of America 328239523 MULTIPOLYGON (((-122.84 49,...
## 6 Kazakhstan 18513930 MULTIPOLYGON (((87.35997 49...
## 7 Uzbekistan 33580650 MULTIPOLYGON (((55.96819 41...
## 8 Papua New Guinea 8776109 MULTIPOLYGON (((141.0002 -2...
## 9 Indonesia 270625568 MULTIPOLYGON (((141.0002 -2...
## 10 Argentina 44938712 MULTIPOLYGON (((-68.63401 -...
• What do we see?
• A 5-line header and a data frame.
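To confirm that an sf object really is a data frame, here is the same kind of selection on the `nc` sample data bundled with {sf}. This is a stand-in for the lesson's countries object: its NAME and BIR74 columns play the roles of name and pop_est.

```r
library(sf)
library(dplyr)

nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# An sf object is a data frame with extra class information
is.data.frame(nc)

# Select two attribute columns plus the geometry column
nc_slice <- nc %>% select(NAME, BIR74, geometry)
nc_slice
```

Printing `nc_slice` should show the same kind of header as above, here reporting 100 features and 2 fields.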
sf header
• The header provides context about the rest of the object.
• Let’s go through the most relevant sections:
• Tells you the number of features and
fields in the sf object:
👉Simple feature collection with 177 features and 2 fields👈
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
Geodetic CRS: +proj=longlat +datum=WGS84
• Features are the rows of the data frame.
• In our countries dataset, each country is a feature.
• Fields are the Attributes of each Feature.
• They are equivalent to columns, not counting the “geometry” column.
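The two counts in the header can be recomputed by hand: features are rows, and fields are the columns left after removing the geometry. A short sketch on the {sf} sample data `nc` (a stand-in dataset, not used elsewhere in the lesson):

```r
library(sf)

nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Features = rows of the sf data frame
n_features <- nrow(nc)

# Fields = attribute columns, i.e. every column except the geometry column
n_fields <- ncol(st_drop_geometry(nc))

c(features = n_features, fields = n_fields)
```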
The spData::nz dataset contains mapping information for
the regions of New Zealand. How many features and fields does the
dataset have?
• Gives you the type of geometry in the sf
object:
Simple feature collection with 177 features and 2 fields
👉Geometry type: MULTIPOLYGON👈
Dimension: XY
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
Geodetic CRS: +proj=longlat +datum=WGS84
• Geometry is a synonym for “shape”.
• There are three main geometry types: points, lines and polygons (POINT, LINESTRING and POLYGON in sf).
• Each has its respective “multi” version: multipoints, multilines and multipolygons (MULTIPOINT, MULTILINESTRING and MULTIPOLYGON).
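A sketch that builds one geometry of each kind with {sf}'s constructor functions and reads back its type. The coordinates are arbitrary illustration values; the last lines ask for the single geometry type of a whole sf object, using the {sf} sample data `nc` as a stand-in.

```r
library(sf)

# One geometry of each kind (arbitrary coordinates, for illustration only)
pt  <- st_point(c(0, 0))
ln  <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 0)))
pg  <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 0))))  # ring is closed
mpg <- st_multipolygon(list(
  list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 0))),
  list(rbind(c(2, 2), c(3, 2), c(3, 3), c(2, 2)))
))

# The second element of class() names the geometry type
types <- vapply(list(pt, ln, pg, mpg), function(g) class(g)[2], character(1))
types

# For a whole sf object: one type summarising its geometry column
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))
st_geometry_type(nc, by_geometry = FALSE)
```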
The ne_download() function from {rnaturalearth} can be used to obtain a map of major world roads, using the code below:
roads <-
  ne_download(scale = 10,
              category = "cultural",
              type = "roads",
              returnclass = "sf")
What type of geometry is used to represent the roads?
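• Each individual sf object usually contains a single geometry type:
• all points, all lines or all polygons.
• Shapefiles enforce this: a single .shp file cannot mix geometry types (sf can technically store mixed geometries under the generic GEOMETRY type, but this is uncommon).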
• The geometry type is tied to the geometry column of the sf data frame.
• The geometry column is the most special property of the sf data frame.
• It holds the core geospatial data (points, linestrings or polygons).
👉Geometry type: MULTIPOLYGON👈
First 10 features:
👇👇👇👇👇👇👇👇👇👇👇👇👇👇👇👇
name pop_est geometry
0 Afghanistan 28400000 MULTIPOLYGON (((61.21082 35...
1 Angola 12799293 MULTIPOLYGON (((16.32653 -5...
2 Albania 3639453 MULTIPOLYGON (((20.59025 41...
3 United Arab Emirates 4798491 MULTIPOLYGON (((51.57952 24...
4 Argentina 40913584 MULTIPOLYGON (((-65.5 -55.2...
5 Armenia 2967004 MULTIPOLYGON (((43.58275 41...
6 Antarctica 3802 MULTIPOLYGON (((-59.57209 -...
7 Fr. S. Antarctic Lands 140 MULTIPOLYGON (((68.935 -48....
8 Australia 21262641 MULTIPOLYGON (((145.398 -40...
9 Austria 8210281 MULTIPOLYGON (((16.97967 48...
• Some noteworthy points about this column:
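• The geometry column is “sticky”: it is kept even when you do not select it, and can only be removed on purpose (for example with sf::st_drop_geometry()).
• geom_sf() automatically recognizes the geometry column.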
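A sketch of the two points above on the {sf} sample data `nc` (a stand-in dataset): the name of the geometry column is stored as an attribute, select() keeps that column even when it is not requested, and st_drop_geometry() is the explicit way to remove it.

```r
library(sf)
library(dplyr)

nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# The sf object records which column holds the geometry
attr(nc, "sf_column")

# Selecting only NAME still keeps the geometry column ("sticky" geometry)
names(select(nc, NAME))

# Dropping the geometry on purpose returns a plain, non-spatial table
plain <- st_drop_geometry(nc)
inherits(plain, "sf")
```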
• Tells us the Coordinate Reference System (CRS) used.
Simple feature collection with 177 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
👉Geodetic CRS: +proj=longlat +datum=WGS84👈
• A CRS relates the spatial elements of the data to the surface of the Earth.
• CRS are a key component of geographic objects.
• We will cover them in detail later!
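Before the detailed CRS lessons, here is a quick sketch of how to inspect the CRS of an sf object with st_crs(), using the {sf} sample data `nc` as a stand-in. That file should be in geographic (longitude/latitude) coordinates, which st_is_longlat() checks.

```r
library(sf)

nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Print the CRS that was read from the Shapefile's .prj file
st_crs(nc)

# Are the coordinates geographic (longitude/latitude)?
st_is_longlat(nc)
```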
• Shapefiles are a collection of files,
• with at least three of them: .shp, .shx and .dbf.
• Each file relates to a component of an sf object.
• Let’s see the component files of a Shapefile called
sle_adm3.shp.
• All of them are located in the same
data/boundaries/ folder:
## # A tibble: 4 × 1
## value
## <chr>
## 1 sle_adm3.dbf
## 2 sle_adm3.prj
## 3 sle_adm3.shp
## 4 sle_adm3.shx
• What is inside each file?
.shp: contains the Geometry data,
.dbf: stores the Attributes (Fields) for each shape,
.shx: is a positional index that links each Geometry with its Attributes,
.prj: plain text file describing the CRS, including the Map Projection.
• These files can be compressed into a ZIP folder and shared!
The mandatory component files (.shp, .shx and .dbf) must all be present in the same directory (folder) for the Shapefile to be readable. The .prj file is optional, but without it the CRS is unknown.
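A small sketch of that rule as code. `missing_shapefile_parts()` is a hypothetical helper written for this example (it is not part of {sf}): it checks which of the mandatory .shp, .shx and .dbf files are missing next to a given .shp path. It is tested on the {sf} sample `nc.shp` and on a temporary copy that lacks its .shx file.

```r
# Hypothetical helper (not part of {sf}): which mandatory parts are missing?
missing_shapefile_parts <- function(shp_path) {
  stem <- tools::file_path_sans_ext(shp_path)
  required <- paste0(stem, c(".shp", ".shx", ".dbf"))
  basename(required[!file.exists(required)])
}

# Complete Shapefile: nothing missing
nc_path <- system.file("shape/nc.shp", package = "sf")
missing_shapefile_parts(nc_path)

# Incomplete copy: only .shp and .dbf are copied to a temporary folder
tmp_dir <- tempfile("partial_")
dir.create(tmp_dir)
file.copy(file.path(dirname(nc_path), c("nc.shp", "nc.dbf")), tmp_dir)
missing_shapefile_parts(file.path(tmp_dir, "nc.shp"))
```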
Which of the following Shapefile component files ("shp", "shx", "dbf"):
• contains the Geometry data?
• stores the Attributes for each shape?
• Let’s write the countries object to a countries.shp file: write the path to the .shp filename, then wrap it with here() inside write_sf().
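A self-contained sketch of the write step. In the lesson, the countries object is written to a countries.shp file inside the project (the target folder is not shown here). So that the snippet runs anywhere, it writes the {sf} sample data `nc` into a fresh temporary folder, lists the files that a single write_sf() call creates, and reads the result back.

```r
library(sf)

nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Write into a fresh temporary folder (stand-in for the lesson's project folder)
out_dir <- file.path(tempdir(), "nc_copy")
dir.create(out_dir, showWarnings = FALSE)
write_sf(nc, file.path(out_dir, "nc_copy.shp"))

# One call, several files: the sidecar files are created next to the .shp
written <- list.files(out_dir)
written

# Round trip: read the new Shapefile back in
back <- read_sf(file.path(out_dir, "nc_copy.shp"))
nrow(back)
```

The .dbf format limits field names to 10 characters, which is why some columns in the health-sites data (such as operatorty and healthca_1) look truncated.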
• Now, all the components of an sf object are stored in four new files that together make up one Shapefile:
## # A tibble: 5 × 1
## value
## <chr>
## 1 countries.dbf
## 2 countries.prj
## 3 countries.shp
## 4 countries.shx
## 5 ignore.md
• We read and wrote Shapefiles using the {sf} package,
• identified the components of an sf object, and
• their relation to the files within a Shapefile.
• Now we need to dive into CRSs,
• and learn how to zoom in on maps and transform CRSs!
• Follow along with the next lessons to practise these skills!
The following team members contributed to this lesson:
Some material in this lesson was adapted from the following sources:
Seimon, Dilinie. Administrative Boundaries. (2021). Retrieved 15 April 2022, from https://rspatialdata.github.io/admin_boundaries.html
Varsha Ujjinni Vijay Kumar. Malaria. (2021). Retrieved 15 April 2022, from https://rspatialdata.github.io/malaria.html
Batra, Neale, et al. The Epidemiologist R Handbook. Chapter 28: GIS Basics. (2021). Retrieved 01 April 2022, from https://epirhandbook.com/en/gis-basics.html
Lovelace, R., Nowosad, J., & Muenchow, J. Geocomputation with R. Chapter 2: Geographic data in R. (2019). Retrieved 01 April 2022, from https://geocompr.robinlovelace.net/spatial-class.html
Moraga, Paula. Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny. Chapter 2: Spatial data and R packages for mapping. (2019). Retrieved 01 April 2022, from https://www.paulamoraga.com/book-geospatial/sec-spatialdataandCRS.html
This work is licensed under the Creative Commons Attribution Share Alike license.
Output from reading sle_hf.shp (health-sites exercise):
## Simple feature collection with 44 features and 11 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -13.26473 ymin: 8.358015 xmax: -13.0821 ymax: 8.490478
## Geodetic CRS: WGS 84
## # A tibble: 44 × 12
## osm_id source addrfull building healthcare operatorty addrcity name amenity
## <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 3.20e9 UNMEE… <NA> <NA> <NA> <NA> <NA> Chin… hospit…
## 2 3.20e9 UNMEE… <NA> <NA> <NA> <NA> <NA> Lakk… hospit…
## 3 3.21e9 https… <NA> <NA> <NA> <NA> <NA> Prin… hospit…
## 4 3.34e9 MSF-CH <NA> <NA> <NA> <NA> <NA> COMM… clinic
## 5 3.34e9 MSF-CH <NA> <NA> <NA> <NA> <NA> Den … clinic
## 6 3.34e9 <NA> <NA> <NA> <NA> <NA> <NA> MABE… clinic
## 7 3.34e9 MSF-CH <NA> <NA> <NA> <NA> <NA> MAYE… clinic
## 8 3.34e9 MSF-CH <NA> <NA> <NA> <NA> <NA> HEAL… clinic
## 9 3.34e9 <NA> <NA> <NA> <NA> <NA> <NA> Dent… dentist
## 10 3.34e9 MSF-CH <NA> <NA> <NA> <NA> <NA> GINE… clinic
## # ℹ 34 more rows
## # ℹ 3 more variables: healthca_1 <chr>, capacitype <chr>, geometry <POINT [°]>